Generalized Clustering Methods for Multivariate Data

Authors

Kishore R. Mosaliganti

Tony Pan

Dan Cowden

Raghu Machiraju

Joel Saltz

Abstract

Efficient analysis of multivariate data, including segmentation and classification, is an inherently complex task because features occur as salient members of clusters in a multi-dimensional data space. These clusters assume a variety of distributions and frequently overlap, which makes segmentation and classification difficult. In many cases, global classification groups similar but distinct features of the dataset into the same cluster. Additionally, partial-voluming effects from the acquisition process complicate the reconstruction of accurate spatial features. In this paper, we present a novel framework for the analysis of multivariate datasets. We employ simple linear material models under a partial-voluming assumption and recursively classify data samples, thereby obtaining both global and regional segmentation in feature space. The simplicity of our approach makes it more computationally efficient than other methods for classifying multivariate data. We present results on data from light microscopy scanners and the Visible Human repository.

CR Categories: K.6.1 [Management of Computing and Information Systems]: Project and People Management—Life Cycle; K.7.m [The Computing Profession]: Miscellaneous—Ethics
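The abstract names two ingredients, a linear material model under partial voluming and recursive classification, without giving their details. The Python sketch that follows is one hypothetical way to illustrate them and is not the authors' implementation. It assumes a two-material mixture x ≈ α·m_A + (1 − α)·m_B, estimates α by least-squares projection onto the segment between the two material means, and substitutes recursive bisecting 2-means for the paper's recursive classifier. The names `mixing_fraction`, `two_means`, and `recursive_bisect` and the parameters `min_size` and `max_depth` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def mixing_fraction(x: Point, m_a: Point, m_b: Point) -> float:
    """Least-squares alpha for x ~ alpha*m_a + (1 - alpha)*m_b, clamped to [0, 1].

    Hypothetical two-material linear partial-volume model (assumption).
    """
    d = [a - b for a, b in zip(m_a, m_b)]  # direction from pure material B to pure A
    r = [xi - b for xi, b in zip(x, m_b)]  # sample offset from pure material B
    dd = sum(di * di for di in d)
    if dd == 0.0:
        return 1.0  # assumption: identical means, so treat the sample as pure material A
    alpha = sum(ri * di for ri, di in zip(r, d)) / dd
    return min(1.0, max(0.0, alpha))


def _dist2(p: Point, q: Point) -> float:
    # Squared Euclidean distance in feature space.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(points: List[Point]) -> List[float]:
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def two_means(points: List[Point], iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split points into two groups with a few Lloyd iterations of 2-means."""
    c0 = list(points[0])                                 # deterministic seed 1
    c1 = list(max(points, key=lambda p: _dist2(p, c0)))  # seed 2: farthest from seed 1
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iters):
        left = [p for p in points if _dist2(p, c0) <= _dist2(p, c1)]
        right = [p for p in points if _dist2(p, c0) > _dist2(p, c1)]
        if not left or not right:
            break  # degenerate split; the caller keeps the parent cluster
        new0, new1 = _mean(left), _mean(right)
        if new0 == c0 and new1 == c1:
            break  # centroids stopped moving
        c0, c1 = new0, new1
    return left, right


def recursive_bisect(points: List[Point], min_size: int = 2,
                     max_depth: int = 3, depth: int = 0) -> List[List[Point]]:
    """Recursively split clusters: depth 1 is a global split, deeper levels are regional."""
    if depth >= max_depth or len(points) < 2 * min_size:
        return [points]
    left, right = two_means(points)
    if len(left) < min_size or len(right) < min_size:
        return [points]  # split would create a too-small cluster; stop here
    return (recursive_bisect(left, min_size, max_depth, depth + 1)
            + recursive_bisect(right, min_size, max_depth, depth + 1))
```

For example, `recursive_bisect(points, min_size=2, max_depth=1)` performs a single global split into two clusters; raising `max_depth` lets each cluster be re-split, which loosely mirrors the move from global to regional segmentation described above. A real partial-volume classifier would also need to decide which pair of materials a boundary sample lies between, which this sketch leaves to the caller.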

Similar resources

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Much research has been done on the use of diffe...

Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

1- INTRODUCTION: The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Different estimates of sediment amounts, along with the lack of long-term measurements, limit the accessibility of reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...

Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals

BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information about them very important. This has led to the need to retrieve and understand data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...

Robust Experimental Design for Multivariate Generalized Linear Models

A simple heuristic is proposed for constructing robust experimental designs for multivariate generalized linear models. The method is based on clustering a set of local optimal designs. A method for finding local D-optimal designs using available resources is also introduced. Clustering, with its simplicity and minimal computation needs, is demonstrated to outperform more complex and sophistica...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

Publication date: 2004
